task planner
High-Performance Dual-Arm Task and Motion Planning for Tabletop Rearrangement
Zhang, Duo, Huang, Junshan, Yu, Jingjin
Abstract-- We propose Synchronous Dual-Arm Rearrangement Planner (SDAR), a task and motion planning (T AMP) framework for tabletop rearrangement, where two robot arms equipped with 2-finger grippers must work together in close proximity to rearrange objects whose start and goal configurations are strongly entangled. T o tackle such challenges, SDAR tightly knit together its dependency-driven task planner (SDAR-T) and synchronous dual-arm motion planner (SDAR-M), to intelligently sift through a large number of possible task and motion plans. Specifically, SDAR-T applies a simple yet effective strategy to decompose the global object dependency graph induced by the rearrangement task, to produce more optimal dual-arm task plans than solutions derived from optimal task plans for a single arm. Leveraging state-of-the-art GPU SIMD-based motion planning tools, SDAR-M employs a layered motion planning strategy to sift through many task plans for the best synchronous dual-arm motion plan while ensuring high levels of success rate. Comprehensive evaluation demonstrates that SDAR delivers a 100% success rate in solving complex, non-monotone, long-horizon tabletop rearrangement tasks with solution quality far exceeding the previous state-of-the-art. Experiments on two UR-5e arms further confirm SDAR directly and reliably transfers to robot hardware. Task and motion planning (T AMP) [1] represents a fundamental computation challenge in robotics, in which a robot system, e.g., one or more robot arms, must break down a given, potentially long-horizon task into suitable "bite-sized" sub-tasks that can be executed through short-horizon robot motions.
- North America > United States > New Jersey > Middlesex County > Piscataway (0.14)
- Europe > Germany > Berlin (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
AdaptPNP: Integrating Prehensile and Non-Prehensile Skills for Adaptive Robotic Manipulation
Zhu, Jinxuan, Tie, Chenrui, Cao, Xinyi, Wang, Yuran, Guo, Jingxiang, Chen, Zixuan, Chen, Haonan, Chen, Junting, Xiao, Yangyu, Wu, Ruihai, Shao, Lin
Abstract-- Non-prehensile (NP) manipulation, in which robots alter object states without forming stable grasps (for example, pushing, poking, or sliding), significantly broadens robotic manipulation capabilities when grasping is infeasible or insufficient. However, enabling a unified framework that generalizes across different tasks, objects, and environments while seamlessly integrating non-prehensile and prehensile (P) actions remains challenging: robots must determine when to invoke NP skills, select the appropriate primitive for each context, and compose P and NP strategies into robust, multi-step plans. We introduce AdaptPNP, a vision-language model (VLM)-empowered task and motion planning framework that systematically selects and combines P and NP skills to accomplish diverse manipulation objectives. Our approach leverages a VLM to interpret visual scene observations and textual task descriptions, generating a high-level plan skeleton that prescribes the sequence and coordination of P and NP actions. A digital-twin based object-centric intermediate layer predicts desired object poses, enabling proactive mental rehearsal of manipulation sequences. We evaluate AdaptPNP across representative P&NP hybrid manipulation tasks in both simulation and real-world environments. These results underscore the potential of hybrid P&NP manipulation as a crucial step toward general-purpose, human-level robotic manipulation capabilities. When manipulating objects to achieve desired configurations, robots typically rely on establishing stable grasps and transporting objects to target locations.
- Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.89)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Robots > Manipulation (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)
Using VLM Reasoning to Constrain Task and Motion Planning
Yan, Muyang, Mengdibayev, Miras, Floros, Ardon, Guo, Weihang, Kavraki, Lydia E., Kingston, Zachary
In task and motion planning, high-level task planning is done over an abstraction of the world to enable efficient search in long-horizon robotics problems. However, the feasibility of these task-level plans relies on the downward refinability of the abstraction into continuous motion. When a domain's refinability is poor, task-level plans that appear valid may ultimately fail during motion planning, requiring replanning and resulting in slower overall performance. Prior works mitigate this by encoding refinement issues as constraints to prune infeasible task plans. However, these approaches only add constraints upon refinement failure, expending significant search effort on infeasible branches. We propose VIZ-COAST, a method of leveraging the common-sense spatial reasoning of large pretrained Vision-Language Models to identify issues with downward refinement a priori, bypassing the need to fix these failures during planning. Experiments on two challenging TAMP domains show that our approach is able to extract plausible constraints from images and domain descriptions, drastically reducing planning times and, in some cases, eliminating downward refinement failures altogether, generalizing to a diverse range of instances from the broader domain.
- Africa > Madagascar (0.04)
- North America > United States > Texas > Harris County > Houston (0.04)
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
A Task-Efficient Reinforcement Learning Task-Motion Planner for Safe Human-Robot Cooperation
Liu, Gaoyuan, de Winter, Joris, Merckaert, Kelly, Steckelmacher, Denis, Nowe, Ann, Vanderborght, Bram
In a Human-Robot Cooperation (HRC) environment, safety and efficiency are the two core properties to evaluate robot performance. However, safety mechanisms usually hinder task efficiency since human intervention will cause backup motions and goal failures of the robot. Frequent motion replanning will increase the computational load and the chance of failure. In this paper, we present a hybrid Reinforcement Learning (RL) planning framework which is comprised of an interactive motion planner and a RL task planner. The RL task planner attempts to choose statistically safe and efficient task sequences based on the feedback from the motion planner, while the motion planner keeps the task execution process collision-free by detecting human arm motions and deploying new paths when the previous path is not valid anymore. Intuitively, the RL agent will learn to avoid dangerous tasks, while the motion planner ensures that the chosen tasks are safe. The proposed framework is validated on the cobot in both simulation and the real world, we compare the planner with hard-coded task motion planning methods. The results show that our planning framework can 1) react to uncertain human motions at both joint and task levels; 2) reduce the times of repeating failed goal commands; 3) reduce the total number of replanning requests.
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Europe > Belgium > Flanders (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- (2 more...)
Robo-Troj: Attacking LLM-based Task Planners
Nahian, Mohaiminul Al, Altaweel, Zainab, Reitano, David, Ahmed, Sabbir, Zhang, Shiqi, Rakin, Adnan Siraj
Robots need task planning methods to achieve goals that require more than individual actions. Recently, large language models (LLMs) have demonstrated impressive performance in task planning. LLMs can generate a step-by-step solution using a description of actions and the goal. Despite the successes in LLM-based task planning, there is limited research studying the security aspects of those systems. In this paper, we develop Robo-Troj, the first multi-trigger backdoor attack for LLM-based task planners, which is the main contribution of this work. As a multi-trigger attack, Robo-Troj is trained to accommodate the diversity of robot application domains. For instance, one can use unique trigger words, e.g., "herical", to activate a specific malicious behavior, e.g., cutting hand on a kitchen robot. In addition, we develop an optimization method for selecting the trigger words that are most effective. Through demonstrating the vulnerability of LLM-based planners, we aim to promote the development of secured robot systems.
- North America > United States > New York > Broome County > Binghamton (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
REI-Bench: Can Embodied Agents Understand Vague Human Instructions in Task Planning?
Jiang, Chenxi, Zhou, Chuhao, Yang, Jianfei
Robot task planning decomposes human instructions into executable action sequences that enable robots to complete a series of complex tasks. Although recent large language model (LLM)-based task planners achieve amazing performance, they assume that human instructions are clear and straightforward. However, real-world users are not experts, and their instructions to robots often contain significant vagueness. Linguists suggest that such vagueness frequently arises from referring expressions (REs), whose meanings depend heavily on dialogue context and environment. This vagueness is even more prevalent among the elderly and children, who robots should serve more. This paper studies how such vagueness in REs within human instructions affects LLM-based robot task planning and how to overcome this issue. To this end, we propose the first robot task planning benchmark with vague REs (REI-Bench), where we discover that the vagueness of REs can severely degrade robot planning performance, leading to success rate drops of up to 77.9%. We also observe that most failure cases stem from missing objects in planners. To mitigate the REs issue, we propose a simple yet effective approach: task-oriented context cognition, which generates clear instructions for robots, achieving state-of-the-art performance compared to aware prompt and chains of thought. This work contributes to the research community of human-robot interaction (HRI) by making robot task planning more practical, particularly for non-expert users, e.g., the elderly and children.
Orchestrating Agents and Data for Enterprise: A Blueprint Architecture for Compound AI
Kandogan, Eser, Bhutani, Nikita, Zhang, Dan, Chen, Rafael Li, Gurajada, Sairam, Hruschka, Estevam
Large language models (LLMs) have gained significant interest in industry due to their impressive capabilities across a wide range of tasks. However, the widespread adoption of LLMs presents several challenges, such as integration into existing applications and infrastructure, utilization of company proprietary data, models, and APIs, and meeting cost, quality, responsiveness, and other requirements. To address these challenges, there is a notable shift from monolithic models to compound AI systems, with the premise of more powerful, versatile, and reliable applications. However, progress thus far has been piecemeal, with proposals for agentic workflows, programming models, and extended LLM capabilities, without a clear vision of an overall architecture. In this paper, we propose a 'blueprint architecture' for compound AI systems for orchestrating agents and data for enterprise applications. In our proposed architecture the key orchestration concept is 'streams' to coordinate the flow of data and instructions among agents. Existing proprietary models and APIs in the enterprise are mapped to 'agents', defined in an 'agent registry' that serves agent metadata and learned representations for search and planning. Agents can utilize proprietary data through a 'data registry' that similarly registers enterprise data of various modalities. Tying it all together, data and task 'planners' break down, map, and optimize tasks and queries for given quality of service (QoS) requirements such as cost, accuracy, and latency. We illustrate an implementation of the architecture for a use-case in the HR domain and discuss opportunities and challenges for 'agentic AI' in the enterprise.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- Workflow (0.51)
- Research Report (0.50)
- Overview (0.46)
Grasping in Uncertain Environments: A Case Study For Industrial Robotic Recycling
Daniels, Annalena, Kerz, Sebastian, Bari, Salman, Gabler, Volker, Wollherr, Dirk
Autonomous robotic grasping of uncertain objects in uncertain environments is an impactful open challenge for the industries of the future. One such industry is the recycling of Waste Electrical and Electronic Equipment (WEEE) materials, in which electric devices are disassembled and readied for the recovery of raw materials. Since devices may contain hazardous materials and their disassembly involves heavy manual labor, robotic disassembly is a promising venue. However, since devices may be damaged, dirty and unidentified, robotic disassembly is challenging since object models are unavailable or cannot be relied upon. This case study explores grasping strategies for industrial robotic disassembly of WEEE devices with uncertain vision data. We propose three grippers and appropriate tactile strategies for force-based manipulation that improves grasping robustness. For each proposed gripper, we develop corresponding strategies that can perform effectively in different grasping tasks and leverage the grippers design and unique strengths. Through experiments conducted in lab and factory settings for four different WEEE devices, we demonstrate how object uncertainty may be overcome by tactile sensing and compliant techniques, significantly increasing grasping success rates.
Non-Prehensile Tool-Object Manipulation by Integrating LLM-Based Planning and Manoeuvrability-Driven Controls
Lee, Hoi-Yin, Zhou, Peng, Duan, Anqing, Ma, Wanyu, Yang, Chenguang, Navarro-Alarcon, David
The ability to wield tools was once considered exclusive to human intelligence, but it's now known that many other animals, like crows, possess this capability. Yet, robotic systems still fall short of matching biological dexterity. In this paper, we investigate the use of Large Language Models (LLMs), tool affordances, and object manoeuvrability for non-prehensile tool-based manipulation tasks. Our novel method leverages LLMs based on scene information and natural language instructions to enable symbolic task planning for tool-object manipulation. This approach allows the system to convert the human language sentence into a sequence of feasible motion functions. We have developed a novel manoeuvrability-driven controller using a new tool affordance model derived from visual feedback. This controller helps guide the robot's tool utilization and manipulation actions, even within confined areas, using a stepping incremental approach. The proposed methodology is evaluated with experiments to prove its effectiveness under various manipulation scenarios.
- Asia > China > Hong Kong (0.06)
- Europe > United Kingdom > England > Merseyside > Liverpool (0.04)
- Asia > Singapore (0.04)
- (7 more...)
One to rule them all: natural language to bind communication, perception and action
Colombani, Simone, Ognibene, Dimitri, Boccignone, Giuseppe
In recent years, research in the area of human-robot interaction has focused on developing robots capable of understanding complex human instructions and performing tasks in dynamic and diverse environments. These systems have a wide range of applications, from personal assistance to industrial robotics, emphasizing the importance of robots interacting flexibly, naturally and safely with humans. This paper presents an advanced architecture for robotic action planning that integrates communication, perception, and planning with Large Language Models (LLMs). Our system is designed to translate commands expressed in natural language into executable robot actions, incorporating environmental information and dynamically updating plans based on real-time feedback. The Planner Module is the core of the system where LLMs embedded in a modified ReAct framework are employed to interpret and carry out user commands. By leveraging their extensive pre-trained knowledge, LLMs can effectively process user requests without the need to introduce new knowledge on the changing environment. The modified ReAct framework further enhances the execution space by providing real-time environmental perception and the outcomes of physical actions. By combining robust and dynamic semantic map representations as graphs with control components and failure explanations, this architecture enhances a robot adaptability, task execution, and seamless collaboration with human users in shared and dynamic environments. Through the integration of continuous feedback loops with the environment the system can dynamically adjusts the plan to accommodate unexpected changes, optimizing the robot ability to perform tasks. Using a dataset of previous experience is possible to provide detailed feedback about the failure. Updating the LLMs context of the next iteration with suggestion on how to overcame the issue.
- Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
- Europe > Monaco (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)